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SUMMARY 


Thi-e  paper  is  a  summary  of  some  applications  of  the 
theory  of  dynamic  programming  tc  various  classes  of 
multi-stage  decision  problems  of  stochastic  type.  ^  ^ 
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§!•  Introduction 


A  large  number  of  problems  of  theoretical  and  practical  Impor¬ 
tance  reduce  to  the  computation  of  the  maximum  of  a  function  of  the 
form 


(1)  F(xj,  x*# 


subject  to  a  series  of  constraints  of  the  form 

(2)  x*»  •••»  xn)  ^  °»  "  1#  7%  •••*  •* 

If  n  is  a  number  of  even  m  derate  size,  prosaic  computational 
techniques  are  utterly  unavailing,  and  the  use  of  unadorned  calculus 
la  equally  fruitless,  and  sometimes  even  Illegitimate. 

If  the  functions  <$«  and  R,  are  linear,  numerical  results,  and 

^  o 

sometimes  even  analytic  results,  may  be  obtained  using  various  ver¬ 
sions  of  the  elegant  "simplex"  technique  devised  by  0.  Dantzlg.  If 
some  of  the  functions  are  non— linear,  the  theory  of  non-linear  pro¬ 
gramming,  as  conceived  by  Kuhn  and  Tucker  and  others,  must  be 
Invoked.  Naturally,  tne  computational  road  Is  not  as  smooth  as  In 
the  linear  case. 


If  the  functions  above  possess  certain  features  of  symmetry, 
or  If,  In  particular,  F(xj,  x*,  ...»  xn)  represents  the  "return" 
of  a  multi-stage  process,  tne  theory  of  dynamic  programming  Is  often 
useful  as  we  have  shown  In  a  number  of  papers,  cf.  [l],  [2]  »  [3]* 

[*].  [5],  pj»  C7L  [$j »  [9]*  In  general,  the  more  non— linear  the 
problem,  tne  more  useful  the  technique  of  this  theory. 
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Por  problems  of  deterministic  type,  such  as  those  posed  above, 
we  have  at  least  a  choice  of  the  techniques  we  may  wish  to  employ. 
However,  if  we  consider  multi-stage  decision  processes  of  stochastic 
type,  where  the  functions  and  coefficients  may  be  stochastic,  then, 
in  general,  there  seems  to  be  little  alternative  to  some  variation 
of  the  functional  equation  approach. 

Use  of  the  functional  equation  technique  Is,  as  pointed  out 
above,  only  profitable  when  the  problem  possesses  certain  symme- 
trlcal  features.  The  general  linear  programming  problem  of  deter¬ 
mining  the  expected  value  of  the  maximum  of 

(3)  F(x)  -  ^  a1x1 

where  the  x^  are  subject  to  constraints  of  the  form 

(4)  (a)  xi  >  0 

(b)  ^  cnxJ  *  v  1 ' m* 

where  the  c^j  and  e^  are  stochastic  parameters  subject  to  given 
probability  distributions,  seems  at  the  moment  still  to  be  outside 
mathematical  ken. 

In  this  paper  we  shall  consider  some  multi-stage  declsl-n 
processes  of  stochastic  type  which  are  particularly  suited  to  the 
techniques  of  the  theory  of  dynamic  programming.  Five  processes 
of  stochastic  type,  concerning  a  wide  range  of  topics,  will  be 


P—589 


treated  In  outline,  and  then  a  deterministic  maximization  problem, 
inserted  to  shot:  how  a  problem  of  this  type  can  profitably  be  con— 
sldered  as  a  multi-stage  decision  problem. 

Some  applications  cf  these  ideas  to  the  calculus  of  variations 
and  theory  of  integral  equations  may  be  found  in  fto]  ,  ftl]  ,  and 

55]  • 


In  the  section  following  the  presentation  of  the  problems  we 
shall  present  some  basic  terminology  of  the  theory  of  dynamic  pro¬ 
gramming.  Following  that  we  shall  discuss  the  "Principle  of 
Optimality",  which  we  utilize  to  obtain  functional  equations  for 
the  determination  of  optimal  policies  in  the  above  decision  pro¬ 
cesses. 

We  shall  then  indicate  the  use  of  successive  approximations 
in  determining  numerical  and  analytic  solutions  and  the  application 
of  the  concept  of  "approximation  in  policy  space"  to  obtain  monotone 
convergence . 

$  2 .  Representative  Multi-Stage  Decision  Processes  of  Stochastic  Type . 

Let  us  now  consider  some  representative  processes  which  we 
shall  show  below  may  be  treated  by  the  methods  of  dynamic  programming. 

$2.1  Optimal  Allocation 

Over  a  period  of  n  years,  it  is  necessary,  at  the  beginning  of 
each  year,  to  order  some  equipment  to  perform  certain  tasks.  We 
possess  an  initial  amount  of  money  x  which  is  to  be  divided  into 
two  parts,  y  and  x-y.  The  first  part,  y,  is  to  be  used  to  purchase 
equipment  of  type  A,  and  the  remaining  amount,  x-y,  is  to  be  used 
to  purchase  equipment  of  type  B. 


I^t  us  assume,  for  simplicity,  that  if  we  spend  y  dollars  to 
purchaoe  A-equlpraent  there  are  two  possibilities. 

(l)  (a)  with  probability  pt  we  obtain  gi(y)  man— hoars  from 

the  equipment  and  retain  a  salvage  value  of  aiy  dollars, 
where  0  <  at  <  1. 

(b)  with  probability  1— pi  we  obtain  g#(y)  man-hours  from 

the  equipment  and  retain  a  salvage  value  of  aay  dollars, 
where  0  <  a*  <  1 . 

Similarly,  if  (x— y)  dollars  are  spent  *for  B— equipment,  we  have 
corresponding  probabilities  qt  and  1-qi,  functions  ht(x-y)  and 
n*(x-y)  and  parameters  b|  and  ba. 

At  the  start  of  each  year  we  repeat  the  process  with  the  new 
initial  amount  equal  to  the  sum  of  the  salvage  values. 

The  problem  is  to  determine  the  allocation  policy  which 
maximizes  the  total  expected  man— hours  obtained  ever  the  n  year 
period . 

§2.2  Optimal  Inventory 

At  various  specified  times  we  have  an  opportunity  to  order 
supplies  of  a  certain  set  of  items,  where  tne  cost  of  ordering  is 
some  function  of  the  amount  ordered  wr.ich  may  or  may  not  Include 
fixed  administrative  or  Hred  tape”  costs.  At  various  otner  times, 
demands  are  made  upon  the  stocks  of  tnese  items.  Tnese  demands 
are  stochastic  and  their  Joint  distribution  function  is  known. 

The  incentive  for  ordering  lies  In  a  penalty  which  is  assessed 
whenever  the  demand  of  an  item  exceeds  the  supply. 
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We  wish  to  determine  tne  ordering  policy  which  minimizes  seme 
average  cost. 

$2.}  Stocnastlc  Gold-Mining 

We  are  fortunate  enough  to  possess  two  gold  mines.  Anaconda  and 
Bonanza,  the  first  of  which  contains  an  amount,  x,  of  gold,  while 
the  second  possesses  an  amount,  y.  In  addition,  we  have  a  rather 
delicate  gold-mining  machine  which  has  the  property  tnat  If  used  to 
mine  gold  in  Anaconda  there  Is  a  probability  p»  that  It  will  mine  a 
fraction  rt  of  the  gold  there  and  remain  in  working  order,  and  a 
probability  (1— pi)  tnat  it  will  mine  no  gold  and  be  damaged  beyond 
repair.  Similarly,  Bonanza  has  associated  the  probabilities  p*  and 
(1— Pa)  and  the  fraction  r*. 

We  begin  by  using  tne  machine  in  either  the  Anaconda  or 
Bonanza  mine.  If  tne  machine  is  undamaged,  we  again  make  a  choice 
of  using  the  machine  in  either  of  the  two  mines,  and  continue  in 
tnls  way,  making  a  choice  before  each  mining  operation,  until  the 
machine  is  damaged. 

What  sequence  of  choices  maximizes  the  amount  of  gold  mined 
before  the  macnine  is  damaged? 

^2.4  Learning 

Let  us  assume  tnat  we  have  two  machines,  I  and  II,  with  the 
following  properties.  If  machine  I  is  used,  there  is  a  probability 
r  of  receiving  a  gain  of  one  unit  and  a  probability  (1— r)  of  re¬ 
ceiving  nothing.  If  machine  II  Is  used,  there  Is  a  corresponding 
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probability  s.  These  probabilities  are  not  known.  We  do,  however, 
possess  an  a  priori  probability  distribution  for  their  values. 

Given  a  fixed  number  of  trials,  the  problem  is  to  determine 
the  sequence  of  choices  which  maximizes  the  total  expected  return. 


52.5  A  Deterministic  Maximization  Problem 


We  wish  to  determine 
constraints 


n 

the  maximum  of  ^  ^x^ , 


subject  to  the 


(1)  (a)  >  0 

(b)  <  a, 

(c)  ^Q(x1)  <  b 

§3*  A  Modicum  of  Nomenclature 

Let  us  now  define  some  useful  terms.  We  are  considering  pro¬ 
cesses,  finite  or  infinite,  discrete  or  continuous,  which  require 
a  sequence  of  decisions.  We  consider  only  feasible  sequences, 
which  is  to  say  those  which  are  consistent  with  various  limitations 
and  constraints  which  may  be  imposed.  Every  feasible  sequence  of 
decisions  is  called  a  policy,  and  a  policy  which  maximizes  the 
"return"  of  the  process  is  called  an  optimal  policy.  By  the 
solution  of  a  problem,  we  mean  the  determination  of  all  optimal 
policies . 

In  order  to  define  precisely  the  "return"  of  a  process,  we 
must  Introduce  the  concept  of  a  state  variable  or  state  parameter . 

We  consider  a  system  S  of  economic,  industrial,  engineering  or  other 
source,  whose  physical  state  at  any  time  is  specified  by  a  set  of 
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quantltles ,  P.  In  many  cases,  P  is  an  N— dimensional  vector,  while 
in  more  complicated  situations,  P  may  consist  of  a  set  of  points 
and  functions,  and  Involve  functionals  as  well.  The  components  of 
P  are  called  the  state  variables. 

The  effect  of  a  decision  Is  to  transform  P  into  another  vector 
Pl.  Hence  a  decision  Is  equivalent  to  a  transformation  of  the  state 
variables.  £very  policy  then  yields  a  sequence  of  states 

(1)  Pi,  P«t  • • • »  Pn»  • • • 

We  now  introduce  a  criterion  function,  $(p),  measuring  the  value 
of  a  state  P,  ana  postulate  that  tne  purpose  In  carrying  out  tr.e 
process  is  to  maximize  this  function  of  the  final  state.  The  func¬ 
tion,  <$(Pp),  of  the  final  state  P~  is  called  the  return  of  the 
process,  and  write  ^(P)  to  denote  the  value,  <j(Pp),  obtained 
starting  in  a  state  P  and  using  a  policy  D. 

Finally,  we  define 

(2)  f(P)  =  Max  qD(P), 

D  u 

wr.ere  we  maximize  over  all  feaslole  policies.  The  policies,  D, 
which  yleid  the  return  f(P)  are  the  optimal  policies. 

$4.  The  Principle  of  Optimality 

We  now  characterize  optimal  policies  by  means  of  the  following 


lntui tive 


PRINCIPLE  OF  OPTIMALITY:  An  optimal  policy  has  the  ; roperty  that 
whatever  tne  Initial  state  and  Initial  decision  are,  the  regaining 
decisions  must  constitute  an  optimal  policy  wltn  regard  to  the  state 
resulting  from  the  first  decision. 

This  principle  Immediately  yields  the  functional  equation 

(1)  f(P)  -  Max  f(P(D)), 

D 

where  P(D)  Is  the  vector  resulting  from  a  choice  of  an  Initial 
decision  D. 

If  we  can  separate  P  Into  a  "space"  vector,  *,  and  a  "time" 
vector  T,  then  (l)  takes  the  form 

(2)  f(w,S+T)  -  Max  f(wD(S),T), 

D[0,Sj  U  J 

where  d[0,Sj  represents  a  sequence  of  decisions  over  the  time-interval 
lO.s]. 

Observe  that  the  form  of  the  equation  Is  the  same  regardless 
of  whether  we  are  dealing  with  a  deterministic  or  stocnastlc  situa¬ 
tion;  in  one  case  we  have  a  function  of  the  final  state,  in  tne 

other  case  an  average  of  functional  values.  Further  discussion  will 
be  found  in  [2j ,  j.5]  and  £llj. 

$5.  Functional  Equations 

Let  us  now  use  the  method  expounded  above  to  convert  the 
problems  discussed  previously  In  #2.1  —52.5  involving  policies 
Into  problems  concerning  the  solution  of  functional  equations. 
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Allocatlon 
Let  us  define 

fn^X/  =  exPected  return  In  manhours  obtained  over  an  N— stage 
period  starting  with  x-dollars  and  using  an  optimal 
policy 

Then 

fi(x)  =»  Max  l  Pi£i(y)  +  (l-Pi)g»(y)-'-qihi(x-yMl-q,)na(x— y)  j  , 
0<y<x 


and 

(3)  C  P»Q»  L  gi(y)^i(x-y)-^fn(aiy<-b|(x-y))  J 

+PaQ  i  L  «a(y)-*-ha(x-y)*-fn(a8y+bl(x-y))  J 
•*-piq»L  £»(y)^a  ^-y)-*-fn(a iy-*-ta(x-y) )  ] 
+Pa<la  C  £a(y)+ha  (x— y )  +fn (aay+ba (x— y ) )  J  . 
for  n  >  1.  see  ^1  j  ,  [2j,  [3],  [4 J  . 


#p.2  Op timal  Inventory 


Let  us  consider  for  the  sake  of  simplicity  a  process  involving 
the  stocking  of  one  item  where  we  may  order  at  each  of  a  finite 
number  of  equally  spaced  times,  and  we  must  fulfill  the  demand  at 
tnese  same  times.  We  assume  that  there  is  no  delay  in  filling  an 
order  or  a  demand. 


We  assume  that  we  know  completely  the  following  functions: 
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(1)  (a)  q(s)ds  =  the  probability  that  the  demand  will  lie  be¬ 

tween  s  and  s+ds. 

(b)  k(z)  »  the  cost  of  ordering  z  Items  Initially  to  Increase 

the  stock  level 

(c)  p(z)  =*  the  cost  of  ordering  z  Items  to  meet  an  excess, 

z,  of  demand  over  supply,  the  penalty  cost 

Let  us  define 

(2)  fn(x)  *  expected  total  cost  for  an  n— stage  process  starting 

with  an  initial  supply  x  and  using  an  optimal  ordering 
policy 


Then 

(3)  fi(x)  =*  Min  l  fc(y-«)  +1/^CO  p(s-y  )q(s)ds  ]  , 
y>x  y 

fn4>1(x)  "  Mln  Ck(y-x)  p(a-y)4(s)ds  +  fn(°)  4 (s )ds 

y>x 

fn(y-8)4(s)d8  J  • 

See  [2  j  ,  [7  j  . 

$9.3  Stochastic  Gold-Mining 
Let  us  define 

(1)  f(x,y)  »  expected  amount  of  gold  mined  before  tne  machine  Is 
damaged  when  A  has  x,  B  has  y  and  an  optimal  policy 
Is  employed 


Then  we  see  that  f(x,y)  satisfies  the  functional  equation 


(2)  f(x,y) 
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=  Max 


A: 

B: 


Pi  L  r  ix+f  ( (l— r  1 )x ,y )  j, 
Q»  L  (x, (l-r8)y)  Z 


II  we  wish  to  maximize  the  expected  value  of  some  function  of  the 
total  return,  R,  say  <^(R),  then  we  must,  in  general,  introduce 
another  state  variable,  a,  the  amount  of  gold  already  mined.  Let 
ua  define 


(3)  f(x,y,a)  *  expected  value  of  <*(R)  when  A  has  x,  B  has  y,  a 

has  already  been  mined  and  an  optimal  policy  is 
employed . 


Then  f(x,y,a)  satisfies  the  equation 


(*) 

f(x,y,a)  *  Max 

A : 

Pif ((i-ri)x,y,a*r»x)  + 

(1-Pi)<?(a), 

See 

C2].  I>J  »  L>lj  • 

B: 

_ 

qif (x, (l-r8)y,a+r8y )  + 

(1— q,Ma) 

§  p.A  Learning 


Let  us  consider  the  case  where  we  have  an  unbounded  set  of 

trials  and  the  return  on  the  k—  trial  is  discounted  by  a  factor 
k— 1 

a  ,  where  0  <  a  <  1.  Furthermore,  to  simplify  the  notation, 
let  us  treat  only  the  case  wnere  one  machine,  II,  has  a  known 
probability  of  success,  3,  and  the  other  machine,  I,  has  an 
a  priori  distribution  of  probabilities  of  success,  dF(r). 

Our  fundamental  assumption  is  that  the  new  a  priori  distri¬ 
bution  function  after  m  successes  and  n  failures  on  the  fir3t 


machine  is 
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(1) 


dPmn<r> 


rm( 1-r )ndF (r) 
r,7l(  l-r)ndF(r) 


On  tne  basis  of  this  assumption,  we  obtain  for  the  function 

(2)  fm  n(s)  »  expected  total  return  obtained  using  an  optimal 

policy  after  m  successes  and  n  failures  on  the  first 
macnlne , 


the  functional  equation 


(3)  -  Max 


l!  Ul'  rJFmn(r)  j[l  *  afn(.1>n(8)  ] 

+  iSo1  (1-r>jp11n(r)  jCaf,,n*l(el  j 

11:  s/ ( 1— a ) 


See  l9j  . 

£  -  . cj  A  Deterministic  Maximization  Problem 


Let  us  write 


n 


(l)  f  (a,b)  ■  the  maximum  value  of  ^  x,  subject  to  the  constraints 


n 


of  (2.^.1). 


Then  clearly  for  n  >  2 


(2)  fn(a,b)  =  Max  Qxt  +  fn-1 (a-F(x t ) , b-G (x » ) )  j 

(xj) 


where  X|  Is  bound  by  the  constraints 
(3)  0  <  x,  <  Min  [  p-^a),  G-1(b)  j. 


and 

(A)  f  1  (a  ,b )  -  Min  L  F”1  (a ) , 


P-5^9 

-13- 


Regardless  of  the  value  of  N  each  maximization  In  (2)  is  a  two— 
dimensional  problem  and  can  ce  programmed  for  computing  machines 
q-ite  easily. 

$6.  Successive  Approx lmatl one 

Generally  speaking,  tne  functional  equations  of  the  above 
sections  are  as  intractable  as  far  as  exact  solutions  are  concerned 
as  the  differential  equations  of  mathematical  pnyclcs,  enp lneerl ng 
or  metnematlcal  economics  are.  They  serve,  however,  two  useful 
purposes.  In  tne  first  place,  a  e.reat  many  structural  properties 
of  optimal  policies  can  be  deduced  from  simple  properties  of  the 
coefficient  functions  which  appear.  Secondly,  tne  metnod  of 
successive  approximations  can  be  used  to  good  effect  to  compute 
tne  maximum  return,  and  in  that  way  the  optimal  policies. 

In  tne  allocation  and  optimal  Inventory  problems  discussed 
above,  tne  U3e  of  successive  approximations  is  qjite  natural.  We 
are  merely  computing  tne  N— stage  return  for  n  -  1,  2,  ...  .  If 
n  is  large,  and  we  are  not  particularly  Interested  In  tne  results 
for  small  n,  tnls  metnod  of  approximation  Is  not  as  efficient  as 
others  we  can  aevise. 

One  particularly  useful  approximation  is  that  of  an  unbounded 

process.  Thus,  for  example,  In  (})  of  £5*2,  we  may,  if  n  Is  large 

replace  f  (x)  by  f(x)  -  lim  f  (x).  This  function  satisfies  the 
n  n — ►co 

equation 

(1)  f(x)  -  Max  lP»qi  L  £»  (y  — y  )+f  (a  »y+b  t  (x-y ) )  ] 

0<y<x 
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+l  aq  i  Z  t:»(y)+h  i  (x-y  )+f (a*y  i  (x-y  ) )  J 
♦P iQa  L  £i(y)+ha(x-y)+f (a ,y*C2(x-y ) )  “ 

+i  »Q2  L  ^a(y)^a(x-y)+l  (a2y-^b2(x-y ) )  3  3  ,  x  >  0, 

witi  f(0)  -  0. 

Tr.is  equation  nay  now  be  solved  by  t  re  met nod  c  l  success!  /e 
approximations,  ucinf:  not  the  criminal  ceq  <ence  {  (x )  3  ,  tne  se¬ 
quence  of  N— sta<  e  returns,  but  a  sequence  cnosen  to  approximate 
f^x)  more  closely  iron  tne  very  beginning. 

We  shall  discuss  a  simple  way  of  dcin.-  tr.is  In  the  next 
section . 

$ 7 .  Approximation  In  Policy  Space 

A  characteristic  ct'  ,  real  Importance  pc  ssessed  by  these 
dynamic  programming  processes  Is  the  duality  between  the  return 
function  and  the  policy  wnich  ylelas  tnls  functl.-n.  TMs  duality 
is  actually  Inherent  in  many  other  functional  equations,  but  net 
as  obviously,  cf.  [10],  [U]  ,  (123  *  &2J  • 

Let  us  consider  equation  (6.1)  tc  illustrate  c-r  re^ar^s.  We 
write  this  equation  In  tr.e  form 

(1)  f(x)  -  Max  T(r,y). 

0<y<x 

The  quantity  y  -  y(x),  the  allocation  wren  we  have  an  amount  x  of 
resources,  we  call  tne  policy  function  or  poll cy . 

Observe  tnat  tne  cnolce  of  a  policy  yields  a  return  function,  and 
conversely  11  we  nave  obtained  f(x),  explicitly  or  by  some  Iterative 
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tecnnique,  then  the  max imization  of  T(f,y)  yields  all  optimal 
value  of  y,  and  hence  all  optimal  policies. 

It  follows  tnat  we  nave  the  privilege  of  solving  the  problem 
by  determining  optimal  policies  or  maximum  returns.  The  method  we 
presented  above  was  approximation  In  function  space.  It  turns  out 
that  tne  alternative  method  of  approximation  in  policy  8 pace  possesses 
very  important  tneoretlcal  ana  computational  advantages. 

Let  us  discuss  tne  the< retlcal  advantages  llrst.  We  obtain  a 
first  approximation  oy  choosing  an  initial  policy  =  j'q(x).  Using, 
thin  policy,  we  compute  I 'q(x)  by  Iteration,  using  the  formula 

(2)  f0(x)  -  T(l0,y0). 

We  now  define 

(3)  f|(x)  -  Max  T  ( f ,, ,  y ) . 

0<y<x  u 

It  is  clear  that  f»(x)  >  f0(x).  Continuing  we  define 

(4)  rnui(x)  =  M;ix  T(f„»y)  *  n  >  1. 

°<ysx  " 

«  o 

Since  T(i',y)  Is  a  positive  operator,  f  »>l  Q  yields  :  ?>f  i  and  hence, 

inductively  f  ,  >  l  .  We  tnus  have  monotone  c  n/ergence.  The 
n  +• 1  —  n 

applications  of  this  concept  to  otner  f  lelos  s  -ch  as  tr.e  calculus  of 
variations  nave  been  discisscu  in  |_10| ,  [llj  ,  I12J  ,  • 

Let  us  now  discuss  the  practical  aspects.  Many  cf  trie  orocesses 
we  consider  nave  been  carried  out  for  some  time  in  the  real  world. 
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In  the  course  of  these  ye-.rs,  a  great  deal  ol'  experience  nas  teen 
gained  and  a  number  ol  techniques  have  been  devised  wr.lch  yield 
results  considerably  tetter  than  what  might  be  obtained  by  an  In¬ 
experienced  person. 

Coneeq  lently ,  In  all  these  cases  .we  ha  we  fair  appr -xl  ietlons 
In  policy  space,  even  tnou,,h  the  corresponding  approxi  mations  In 
function  space  may  be  com? letely  lacking . 
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